Triplet repeats in human genome: distribution and their association with genes and other genomic regions
Abstract
MOTIVATION: Simple sequence repeats (SSRs), or microsatellite repeats, are abundant in many prokaryotic and eukaryotic genomes. Among SSRs, triplet repeats are of special significance because some of them have been linked to genetic disorders. The objective of this study was to analyze the triplet repeats of the complete human genome and to identify the genes that contain triplet repeats in their coding regions. The analysis is intended to help identify candidate genes with potential for repeat expansion.

RESULTS: We analyzed triplet repeats in the complete human genome using publicly available sequences. AGC and CCG repeats were predominantly present in the coding regions of the genome, while UTRs and upstream sequences contained CCG repeats in relative abundance. In terms of density (bp/Mb), AAT and AAC were the most abundant triplet repeats in the human genome, whereas ACT and ACG were the least abundant. We identified about 2135 known or predicted genes associated with at least one triplet repeat type. A large proportion of the putative transcripts identified by gene-finding programs were associated with triplet repeats; these transcripts are candidate genes for analyzing triplet repeat expansion and its possible association with disease phenotypes. The 171 genes that contain a minimum of ten repeat units will be of particular interest for future studies correlating them with disease phenotypes, because of the expansion potential of the repeats they carry. The list of genes and other details of the analysis are given in the online supplementary data (http://www.ingenovis.com/tripletrepeats).
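The abstract reports repeats by motif class and measures density in bp/Mb, but does not describe how repeats were detected. The sketch below illustrates one plausible way such a scan could work; it is not the authors' pipeline. Assumptions: only perfect (uninterrupted) repeats are counted; motifs are grouped into the ten standard trinucleotide classes by rotation and reverse complement (the class names used above, such as AGC and CCG, are consistent with that convention); the default threshold of five units and the function names `canonical_triplet`, `find_triplet_repeats` and `density_bp_per_mb` are hypothetical.

```python
from collections import Counter

# Hypothetical sketch of a triplet-repeat scan; NOT the method used in the paper.

# Complement table for the four unambiguous bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_triplet(motif: str) -> str:
    """Map a trinucleotide to its repeat class (assumed convention): the
    lexicographically smallest rotation of the motif or of its reverse
    complement, so CAG, AGC, GCA, CTG, TGC and GCT all map to AGC."""
    candidates = []
    for m in (motif, reverse_complement(motif)):
        candidates.extend(m[i:] + m[:i] for i in range(3))
    return min(candidates)


def find_triplet_repeats(seq: str, min_units: int = 5):
    """Find perfect triplet tandem repeats with at least `min_units` copies
    (the default of 5 is an assumption). Returns (start, end, class, units)
    tuples; coordinates are 0-based, end-exclusive, whole copies only."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i + 3 <= n:
        motif = seq[i:i + 3]
        # Skip motifs containing N or other symbols, and homopolymers like AAA.
        if set(motif) <= set("ACGT") and len(set(motif)) > 1:
            j = i + 3
            # Extend while the sequence stays periodic with period 3.
            while j < n and seq[j] == seq[j - 3]:
                j += 1
            units = (j - i) // 3
            if units >= min_units:
                end = i + 3 * units
                hits.append((i, end, canonical_triplet(motif), units))
                i = end  # resume scanning after the repeat
                continue
        i += 1
    return hits


def density_bp_per_mb(hits, seq_len: int) -> dict:
    """Density per repeat class: total repeat length (bp) per Mb of sequence."""
    totals = Counter()
    for start, end, repeat_class, _units in hits:
        totals[repeat_class] += end - start
    mb = seq_len / 1_000_000
    return {cls: bp / mb for cls, bp in totals.items()}


if __name__ == "__main__":
    seq = "TT" + "CAG" * 10 + "GG" + "AAT" * 6 + "CC"
    hits = find_triplet_repeats(seq)
    print(hits)  # [(2, 32, 'AGC', 10), (34, 52, 'AAT', 6)]
    print(density_bp_per_mb(hits, len(seq)))
    # Keep only repeats with at least ten units, the threshold highlighted above.
    print(find_triplet_repeats(seq, min_units=10))  # [(2, 32, 'AGC', 10)]
```

Comparing each base with the base three positions earlier finds the full extent of a period-3 run without regex backtracking. Splitting results by coding region, UTR or upstream sequence would amount to running the same scan on each annotated region and summing per class.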
Similar sources
Genome-wide Association Study to Identify Genes and Biological Pathways Associated with Type Traits in Cattle using Pathway Analysis
Extended Abstract Introduction and Objective: Type traits describing the skeletal characteristics of an animal are moderately to strongly genetically correlated with other economically important traits in cattle, including fertility, longevity and carcass traits. The present study aimed to conduct a genome-wide association study (GWAS) based on gene-set enrichment analysis for identifying the ...
SSRD: Simple Sequence Repeats Database of the Human Genome
Simple sequence repeats are predominantly found in most organisms. They play a major role in studies of genetic diversity, and are useful as diagnostic markers for many diseases. The simple sequence repeats database (SSRD) for the human genome was created for easy access to such repeats, for analysis, and to be used to understand their biological significance. The data includes the abundance an...
Preference of simple sequence repeats in coding and non-coding regions of Arabidopsis thaliana
MOTIVATION Simple sequence repeats or microsatellites have been found abundantly in many genomes. However, the significance of distribution preference has not been completely understood. Completion of the Arabidopsis genome sequencing allows us to better understand and characterize microsatellites. RESULTS Microsatellite distribution was more abundant in 5'-flanking regions of genes compared ...
Comparative bioinformatics analysis of a wild diploid Gossypium with two cultivated allotetraploid species
Background: Gossypium thurberi is a wild diploid species that has been used to improve cultivated allotetraploid cotton. G. thurberi belongs to the D genome, which is an important wild bio-source for cotton breeding and genetic research. To a certain degree, chloroplast DNA sequence information is a versatile tool for species identification and phylogenetic implications in plants. Different ch...
Long non-coding RNAs and their significance in human diseases
Protein-coding genes account for only a small fraction of the human genome and most of the genomic sequences are transcriptionally silent, but recent observations indicate significant functional elements, including non-coding protein transcripts in the human genome. Long non-coding RNAs (lncRNAs) have been defined as transcripts of >200 nucleotides without protein-coding capacity that perform t...
Journal title:
- Bioinformatics
Volume 19, Issue 5
Publication date: 2003